/tmp/<tarname>/
folder and then extracted. For each compressed tar file we generate two output files: <tarname>features.dat for the binary features and <tarname>features.dat.csv for the file list.
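For illustration, here is a minimal sketch of how such an output pair could be loaded back, assuming the .dat file holds a flat array of float32 feature vectors and the .csv file lists one image per row; the dtype, the feature dimension, and the exact layout are assumptions, not documented behaviour:

```python
import numpy as np
import pandas as pd

FEATURE_DIM = 512  # assumed feature dimension; set to your model's output size

# Read the binary feature matrix: one float32 row per image (layout assumed).
features = np.fromfile('mytarfeatures.dat', dtype=np.float32).reshape(-1, FEATURE_DIM)

# Read the matching file list; row i names the image behind features[i].
file_list = pd.read_csv('mytarfeatures.dat.csv')

assert len(file_list) == features.shape[0], 'file list and feature rows should align'
```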
An example output file for the tar above is written under the path given via the work_dir command line argument.
Note that the deletion behaviour is configurable. By default, both the downloaded tar files and the extracted images are deleted after the feature vectors are extracted. If you want to keep them locally (assuming you have enough disk space), you can run with:
turi_param='delete_tar=0,delete_img=0'
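For context, this is how the flag could be combined with the other arguments mentioned in this README, written as a hedged Python sketch; the module name mytool and its run() entry point are hypothetical placeholders, while work_dir and turi_param are the argument names used above:

```python
import mytool  # hypothetical module name; substitute the actual package

# Keep the downloaded tars and extracted images under /tmp instead of
# deleting them once the feature vectors have been extracted.
mytool.run(
    input_dir='s3://mybucket/myfolder',      # hypothetical argument: tar file location
    work_dir='/path/to/output',              # where the <tarname>features.dat files go
    turi_param='delete_tar=0,delete_img=0',  # keep tars and images, as described above
)
```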
This keeps all the downloaded tars and images in the /tmp folder.

Running example. Assume you have the full dataset downloaded into s3://mybucket/myfolder, 40,000 tar files in total. Further assume you want to run using 20 compute nodes to extract the features in parallel. In this case you can run:
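As a hedged illustration of how the work could be sharded, the sketch below splits the 40,000 tar files evenly across the 20 nodes, 2,000 tars per node; the min_offset/max_offset range arguments and the mytool.run() entry point are hypothetical placeholders for whatever range-selection mechanism the tool actually provides:

```python
import os
import mytool  # hypothetical module name; substitute the actual package

NUM_TARS = 40_000
NUM_NODES = 20
TARS_PER_NODE = NUM_TARS // NUM_NODES  # 2,000 tar files per node

# One process per compute node; the node index could come from the scheduler.
node_id = int(os.environ.get('NODE_ID', 0))

mytool.run(
    input_dir='s3://mybucket/myfolder',
    work_dir='/path/to/output',
    min_offset=node_id * TARS_PER_NODE,        # hypothetical: first tar index for this node
    max_offset=(node_id + 1) * TARS_PER_NODE,  # hypothetical: one past the last tar index
)
```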